GreenHDFS: Towards An Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster
Authors
Abstract
The Hadoop Distributed File System (HDFS) presents unique challenges to existing energy-conservation techniques and makes it hard to scale down servers. We propose GreenHDFS, an energy-conserving, hybrid, logically multi-zoned variant of HDFS for managing data-processing-intensive commodity Hadoop clusters. GreenHDFS's data-classification-driven data placement enables scale-down by guaranteeing substantially long periods (several days) of idleness in a subset of servers in the datacenter designated as the Cold Zone. These servers are then transitioned to high-energy-saving, inactive power modes. This does not impact the performance of the Hot Zone: studies have shown that servers in data-intensive compute clusters are under-utilized, so there is room to consolidate the workload on the Hot Zone. Analysis of traces from a Yahoo! Hadoop cluster showed significant heterogeneity in data access patterns, which can be used to guide energy-aware data placement policies. Trace-driven simulation with three months of real-life HDFS traces from a Hadoop cluster at Yahoo! shows a 26% reduction in energy consumption from Cold Zone power management alone. An analytical cost model projects savings of $14.6 million in 3-year total cost of ownership (TCO), and simulation results extrapolate to savings of $2.4 million annually when the GreenHDFS technique is applied across all Hadoop clusters at Yahoo! (38,000 servers in total).
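The data-classification-driven placement described above can be illustrated with a toy model. The following is a minimal Python sketch, not the GreenHDFS implementation: it assumes files are classified purely by the number of days since their last access, and the thresholds (`COLD_AFTER_DAYS`, `SLEEP_AFTER_DAYS`), the class names, and the rule that returns a re-accessed cold file to the Hot Zone are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds (in days); the abstract gives no concrete values.
COLD_AFTER_DAYS = 30   # a Hot Zone file untouched this long is classified as cold
SLEEP_AFTER_DAYS = 5   # a Cold Zone server idle this long enters an inactive power mode


@dataclass
class FileInfo:
    path: str
    last_access_day: int   # simulated day of the most recent access
    zone: str = "hot"      # "hot" or "cold"


@dataclass
class ColdServer:
    name: str
    last_access_day: int   # simulated day the server last served a request
    state: str = "active"  # "active" or "sleep"


def migrate_cold_files(files, today):
    """Classify Hot Zone files by age since last access; move old ones to the Cold Zone."""
    moved = []
    for f in files:
        if f.zone == "hot" and today - f.last_access_day >= COLD_AFTER_DAYS:
            f.zone = "cold"
            moved.append(f.path)
    return moved


def power_down_idle(servers, today):
    """Put Cold Zone servers that have been idle long enough into an inactive power mode."""
    slept = []
    for s in servers:
        if s.state == "active" and today - s.last_access_day >= SLEEP_AFTER_DAYS:
            s.state = "sleep"
            slept.append(s.name)
    return slept


def access(f, today):
    """Record an access to a file; a cold file that is read again moves back to the Hot Zone.

    Returns True if the file was moved back (an assumed reversal rule).
    Waking the Cold Zone server that held the file is not modeled here.
    """
    f.last_access_day = today
    if f.zone == "cold":
        f.zone = "hot"
        return True
    return False
```

For example, on day 45 a file last accessed on day 0 is moved to the Cold Zone while one accessed on day 44 stays hot, and a Cold Zone server idle since day 38 is put to sleep. Keeping placement and power management as separate policies reflects the abstract's point that Cold Zone servers can only sleep for long stretches once rarely accessed data has been concentrated on them.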
Similar resources
Storage Support for Data-Intensive Applications on Extreme-Scale HPC Systems
Many believe that current high-performance computing (HPC) storage systems would not meet the I/O requirement of the emerging exascale computing because of the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the system availability would go towards zero at exascale. This work proposes a storage architecture with node-local disks for HPC system...
Hadoop in Low-Power Processors
In our previous work we introduced a so-called Amdahl blade microserver that combines a low-power Atom processor with a GPU and an SSD to provide a balanced and energy-efficient system. Our preliminary results suggested that the sequential I/O of Amdahl blades can be ten times higher than that of a cluster of conventional servers with comparable power consumption. In this paper we investigate the...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
A Cyber-Physical, Data-Centric Cooling Energy Costs Reduction Approach for Big Data Analytics Cloud
The Big Data explosion and the surge in large-scale Big Data analytics cloud infrastructure have led to burgeoning energy costs and present a challenge to existing run-time cooling energy management techniques. T*GreenHDFS, a thermal-aware cloud file system, takes a novel, data-centric approach to reduce cooling energy costs. On the physical side, T*GreenHDFS is cognizant of the uneven thermal profi...
Augmenting MapReduce with Active Volunteer Resources
The migration of interactive workloads, such as desktop applications, into clouds presents significant opportunities for efficiency improvements. The bursty and interactive nature of such workloads makes it challenging to aggressively consolidate them on multi-tenant systems. In such scenarios, utilizing residual or wasted CPU cycles is particularly appealing, which helps amortize the cost of p...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2010